Neural Distributed Compressor Discovers Binning
We consider lossy compression of an information source when the decoder has
lossless access to a correlated one. This setup, also known as the Wyner-Ziv
problem, is a special case of distributed source coding. To this day, practical
approaches for the Wyner-Ziv problem have neither been fully developed nor
heavily investigated. We propose a data-driven method based on machine learning
that leverages the universal function approximation capability of artificial
neural networks. We find that our neural network-based compression scheme,
based on variational vector quantization, recovers some principles of the
optimum theoretical solution of the Wyner-Ziv setup, such as binning in the
source space as well as optimal combination of the quantization index and side
information, for exemplary sources. These behaviors emerge although no
structure exploiting knowledge of the source distributions was imposed. Binning
is a widely used tool in information theoretic proofs and methods, and to our
knowledge, this is the first time it has been explicitly observed to emerge
from data-driven learning.
Comment: draft of a journal version of our previous ISIT 2023 paper (available
at: arXiv:2305.04380). arXiv admin note: substantial text overlap with
arXiv:2305.0438
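To illustrate what binning means in this setting, here is a minimal sketch of a classical hand-designed scheme for a scalar source. It is not the learned compressor from the paper. The function names, the uniform quantizer, and the parameters `step` and `num_bins` are assumptions chosen for illustration. The encoder sends only the quantization index modulo `num_bins`, and the decoder uses the side information to pick the matching quantization point closest to it.

```python
import numpy as np

def wz_encode(x, step, num_bins):
    """Quantize x uniformly, then keep only the index modulo num_bins (the bin)."""
    q = np.round(np.asarray(x) / step).astype(int)
    return np.mod(q, num_bins)

def wz_decode(bin_idx, y, step, num_bins):
    """Return the quantization point in the received bin nearest to side information y."""
    q_y = np.round(np.asarray(y) / step).astype(int)
    # Smallest signed offset from q_y that lands on an index congruent to bin_idx.
    half = num_bins // 2
    offset = np.mod(bin_idx - q_y + half, num_bins) - half
    return (q_y + offset) * step

# Hypothetical usage: y is a noisy copy of x that only the decoder sees.
x = np.array([1.3, -2.7])
y = np.array([1.4, -2.6])
x_hat = wz_decode(wz_encode(x, step=0.25, num_bins=8), y, step=0.25, num_bins=8)
```

The rate drops from the full index range to `num_bins` symbols, and decoding stays correct as long as the source and side information differ by fewer than roughly `num_bins / 2` quantization steps. A better decoder would also combine the index and side information into an estimate rather than returning the quantization point itself.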
The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric
We show how perceptual embeddings of the visual system can be constructed at
inference-time with no training data or deep neural network features. Our
perceptual embeddings are solutions to a weighted least squares (WLS) problem,
defined at the pixel-level, and solved at inference-time, that can capture
global and local image characteristics. The distance in embedding space is used
to define a perceptual similarity metric which we call LASI: Linear
Autoregressive Similarity Index. Experiments on full-reference image quality
assessment datasets show LASI performs competitively with learned deep feature
based methods like LPIPS (Zhang et al., 2018) and PIM (Bhardwaj et al., 2020),
at a similar computational cost to hand-crafted methods such as MS-SSIM (Wang
et al., 2003). We found that increasing the dimensionality of the embedding
space consistently reduces the WLS loss while increasing performance on
perceptual tasks, at the cost of increasing the computational complexity. LASI
is fully differentiable, scales cubically with the number of embedding
dimensions, and can be parallelized at the pixel-level. A Maximum
Differentiation (MAD) competition (Wang & Simoncelli, 2008) between LASI and
LPIPS shows that both methods are capable of finding failure points for the
other, suggesting these metrics can be combined.
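For readers unfamiliar with weighted least squares, the following is a generic sketch of solving one WLS problem through the normal equations. It is not the LASI implementation: the abstract does not specify the neighborhoods or weights, so the function name `wls_solve` and the arguments `X`, `t`, and `a` are assumptions. Solving the d-by-d linear system is the step whose cost grows cubically with the number of dimensions d.

```python
import numpy as np

def wls_solve(X, t, a):
    """Return w minimizing sum_i a[i] * (t[i] - X[i] @ w) ** 2.

    X: (n, d) regressors, t: (n,) targets, a: (n,) non-negative weights.
    """
    Xa = X * a[:, None]                    # rows of X scaled by their weights
    # Normal equations: (X^T A X) w = X^T A t, with A = diag(a).
    return np.linalg.solve(X.T @ Xa, Xa.T @ t)

# Hypothetical usage with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([0.5, -1.0, 2.0])
t = X @ w_true
a = rng.uniform(0.1, 1.0, size=50)
w = wls_solve(X, t, a)
```

Because each problem is independent of the others, many such systems can be solved in parallel, for example one per pixel.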
Median Trilateral Loop Filter for Depth Map Video Coding
Emerging extensions to conventional stereo video technologies, such as 3D video, require adding depth information to 2D video data. This supplementary data needs to be coded efficiently and transmitted to the receiver, where it is used to generate arbitrary viewpoints. Depth maps are characterized by piecewise smooth regions bounded by sharp edges that describe depth discontinuities along object boundaries. Preserving these characteristics, and especially the depth discontinuities, is a crucial requirement for depth map coding. When depth maps are coded with a conventional hybrid video coder, ringing artifacts are introduced along the sharp edges and degrade quality when the reconstructed depth maps are used for view synthesis. To reduce these ringing artifacts and to better align object boundaries in video and depth data, a new in-loop filter is proposed, which reconstructs the described characteristics of depth maps